-
Notifications
You must be signed in to change notification settings - Fork 37
adds formatter option for preserving comments (redo) #187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a PreserveComments option to SqlScriptGeneratorOptions to enable the script generator to preserve T-SQL comments (both single-line -- and multi-line /* */) in the generated output. This addresses issue #20 which requested comment preservation for code formatting/pretty-printing scenarios.
Changes:
- Adds
PreserveCommentsboolean option toSqlScriptGeneratorOptionswith default valuefalsefor backward compatibility - Implements comment tracking and emission logic in
SqlScriptGeneratorVisitor.Comments.cs - Adds helper classes
CommentInfoandCommentPositionto manage comment metadata - Modifies
TSqlScriptandTSqlBatchvisitors to call comment emission logic before/after fragments - Adds comprehensive unit tests in
ScriptGeneratorTests.csto verify comment preservation behavior - Adds baseline test scripts
SingleLineCommentTests170.sqlandMultiLineCommentTests170.sql
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| SqlScriptGeneratorOptions.xml | Adds PreserveComments boolean setting with false default |
| SqlScriptGeneratorVisitor.Comments.cs | Core implementation of comment tracking and emission logic |
| SqlScriptGeneratorVisitor.TSqlScript.cs | Integrates comment handling at script level |
| SqlScriptGeneratorVisitor.TSqlBatch.cs | Integrates comment handling at batch and statement levels |
| CommentInfo.cs | Helper class to track comment metadata |
| CommentPosition.cs | Enum defining comment positions (Leading/Trailing/Standalone) |
| ScriptGeneratorTests.cs | 13 comprehensive unit tests for comment preservation |
| Only170SyntaxTests.cs | Adds comment test scripts to baseline test suite |
| SingleLineCommentTests170.sql | Test script with single-line comments |
| MultiLineCommentTests170.sql | Test script with multi-line comments |
| Baselines170/SingleLineCommentTests170.sql | Expected output with comments stripped (default behavior) |
| Baselines170/MultiLineCommentTests170.sql | Expected output with comments stripped (default behavior) |
| foreach (var comment in trailingComments) | ||
| { | ||
| // Find the index of this comment token | ||
| for (int i = newLastIndex + 1; i < _currentTokenStream.Count; i++) | ||
| { | ||
| if (_currentTokenStream[i] == comment.Token) | ||
| { | ||
| newLastIndex = i; | ||
| break; | ||
| } | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The token lookup logic uses reference equality comparison (_currentTokenStream[i] == comment.Token) which is inefficient and potentially fragile. Instead of linear search through tokens on each comment, consider tracking token indices directly in CommentInfo. Store the actual token index from the stream when creating CommentInfo, then update _lastProcessedTokenIndex directly from that stored value. This would eliminate the nested loop and improve performance when there are many trailing comments.
| for (int i = startIndex; i < endIndex && i < _currentTokenStream.Count; i++) | ||
| { | ||
| var token = _currentTokenStream[i]; | ||
| if (IsCommentToken(token)) | ||
| { | ||
| comments.Add(new CommentInfo(token, CommentPosition.Leading, fragment.FirstTokenIndex)); |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CommentInfo constructor receives 'fragmentIndex' parameter but the comment gathering methods (GetLeadingComments, GetTrailingComments) don't pass the actual token index of the comment itself. The fragmentIndex parameter should be renamed to clarify it represents the associated fragment's index, not the comment's own token index. Better yet, add a TokenIndex property to CommentInfo and populate it with the actual index (i) when creating CommentInfo objects in GetLeadingComments and GetTrailingComments. This would enable efficient token tracking without the nested loop in AfterVisitFragment.
| #region Comment Tracking Fields | ||
|
|
||
| /// <summary> | ||
| /// Tracks the last token index processed for comment emission. | ||
| /// Used to find comments between visited fragments. | ||
| /// </summary> | ||
| private int _lastProcessedTokenIndex = -1; | ||
|
|
||
| /// <summary> | ||
| /// The current script's token stream, set when visiting begins. | ||
| /// </summary> | ||
| private IList<TSqlParserToken> _currentTokenStream; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _lastProcessedTokenIndex and _currentTokenStream fields are instance fields of SqlScriptGeneratorVisitor without any thread synchronization. If the same SqlScriptGenerator instance is used concurrently from multiple threads (which is not a typical pattern but is possible), these fields could be corrupted. Consider documenting that SqlScriptGenerator instances are not thread-safe and should not be shared across threads, or add appropriate thread-safety mechanisms.
| <Summary>Gets or sets the number of newlines to include after each statement</Summary> | ||
| </Setting> | ||
| <Setting name="PreserveComments" type="bool" default="false"> | ||
| <Summary>Gets or sets a boolean indicating if comments from the original script should be preserved in the generated output</Summary> |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PreserveComments setting has a clear, descriptive summary. However, consider adding more detailed documentation about limitations: (1) only statement-level and batch-level comments are preserved, not comments within expressions or clauses; (2) the feature requires the token stream to be available (ScriptTokenStream must be non-null). This would help developers understand what to expect when using this option.
| <Summary>Gets or sets a boolean indicating if comments from the original script should be preserved in the generated output</Summary> | |
| <Summary>Gets or sets a boolean indicating if comments from the original script should be preserved in the generated output. Only statement-level and batch-level comments are preserved (not comments within expressions or clauses), and comment preservation requires the script token stream to be available (ScriptTokenStream must be non-null).</Summary> |
| Trailing, | ||
|
|
||
| /// <summary> | ||
| /// Comment not directly associated with code (e.g., end of file, standalone block). | ||
| /// </summary> | ||
| Standalone |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CommentPosition.Standalone enum value is defined but never used in the codebase. All comments are classified as either Leading or Trailing. If Standalone comments are not needed for this implementation, remove this enum value to avoid confusion. If they are needed for future functionality (e.g., orphaned comments at the end of a script), add a comment explaining the intended use case.
| Trailing, | |
| /// <summary> | |
| /// Comment not directly associated with code (e.g., end of file, standalone block). | |
| /// </summary> | |
| Standalone | |
| Trailing |
| // Stop if we've gone past the same line (unless it's whitespace with no newline) | ||
| if (token.Line > lastTokenLine) | ||
| { | ||
| break; | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GetTrailingComments method only captures comments on the same line as the fragment's last token (checking token.Line > lastTokenLine). This means comments that appear on subsequent lines after a statement, but before the next statement, will be missed and not emitted. These "orphaned" comments would typically be captured as leading comments of the next fragment, which is acceptable. However, comments at the very end of the script (after the last statement) would not be captured at all unless TSqlScript.LastTokenIndex extends beyond the last statement to include them. Consider documenting this behavior or adding a test case to verify end-of-script comments are handled correctly.
| -- Comment between statements | ||
| SELECT 3; | ||
| -- Another comment | ||
| SELECT 4; |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test scripts (SingleLineCommentTests170.sql and MultiLineCommentTests170.sql) don't include any comments after the final statement. This is a gap in test coverage that should verify comments at the end of the script are properly preserved. Add test cases with trailing comments after the last SELECT statement to ensure end-of-script comments are handled correctly.
| public void TestPreserveCommentsEnabled_CreateTable() | ||
| { | ||
| // Test comments with CREATE TABLE statement | ||
| var sqlWithComments = @"-- Table creation comment | ||
| CREATE TABLE TestTable ( | ||
| -- Primary key column | ||
| Id INT PRIMARY KEY, | ||
| -- Name column | ||
| Name NVARCHAR(100) | ||
| );"; | ||
| var parser = new TSql170Parser(true); | ||
| var fragment = parser.Parse(new StringReader(sqlWithComments), out var errors); | ||
|
|
||
| Assert.AreEqual(0, errors.Count); | ||
|
|
||
| var generatorOptions = new SqlScriptGeneratorOptions | ||
| { | ||
| PreserveComments = true | ||
| }; | ||
| var generator = new Sql170ScriptGenerator(generatorOptions); | ||
| generator.GenerateScript(fragment, out var generatedSql); | ||
|
|
||
| // Table creation comment should be preserved | ||
| Assert.IsTrue(generatedSql.Contains("-- Table creation comment"), | ||
| "Table creation comment should be preserved"); | ||
|
|
||
| // Verify comment appears BEFORE the CREATE keyword (correct positioning) | ||
| int commentIndex = generatedSql.IndexOf("-- Table creation comment"); | ||
| int createIndex = generatedSql.IndexOf("CREATE", StringComparison.OrdinalIgnoreCase); | ||
| Assert.IsTrue(commentIndex < createIndex, | ||
| $"Table comment should appear before CREATE. Comment at {commentIndex}, CREATE at {createIndex}"); | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test coverage doesn't include comments within complex SQL constructs such as: comments inside column definitions (CREATE TABLE), comments within CASE expressions, comments in JOIN clauses, or comments in WHERE clause predicates. Add test cases to verify that comments embedded within these constructs are preserved correctly.
| public override void ExplicitVisit(TSqlBatch node) | ||
| { | ||
| // Emit leading comments before the batch | ||
| BeforeVisitFragment(node); | ||
|
|
||
| foreach (TSqlStatement statement in node.Statements) | ||
| { | ||
| // Emit leading comments before each statement | ||
| BeforeVisitFragment(statement); | ||
|
|
||
| GenerateFragmentIfNotNull(statement); | ||
|
|
||
| GenerateSemiColonWhenNecessary(statement); | ||
|
|
||
| // Emit trailing comments after each statement | ||
| AfterVisitFragment(statement); | ||
|
|
||
| if (statement is TSqlStatementSnippet == false) | ||
| { | ||
| NewLine(); | ||
| NewLine(); | ||
| } | ||
| } | ||
|
|
||
| // Emit trailing comments after the batch | ||
| AfterVisitFragment(node); | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment preservation logic only calls BeforeVisitFragment/AfterVisitFragment on TSqlScript, TSqlBatch, and TSqlStatement nodes. Comments embedded within sub-expressions (e.g., inside CASE expressions, JOIN clauses, column definitions, or WHERE predicates) will not be preserved because these fragment types don't have the Before/AfterVisitFragment calls. Either document this limitation clearly in the PreserveComments property documentation, or expand the implementation to call these methods for all fragment types (which would require modifying the visitor pattern significantly).

Description
resolves #20
enables the code formatter to preserve T-SQL comments, including
Code Changes